Climbing Mont BLEU: The Strange World of Reachable High-BLEU Translations
Abstract
We present a method for finding oracle BLEU translations in phrase-based statistical machine translation using exact document-level scores. We report experiments in which the BLEU score of a candidate translation is optimised directly, in order to examine the properties of reachable translations with very high BLEU scores. This is achieved by running the document-level decoder Docent in BLEU-decoding mode, where proposed changes to the translation of a document are accepted only if they increase BLEU. The results confirm that in most cases the decoder, which is limited to the phrases in the phrase table, cannot reach the reference translation, and they demonstrate that high-BLEU translations are often of poor quality.
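The accept-only-if-BLEU-increases idea can be illustrated with a minimal Python sketch. This is not Docent and not the authors' implementation: it assumes a toy setting in which each sentence has a fixed list of alternative translations (standing in for what the phrase table can reach), and the names `document_bleu`, `hill_climb`, and `candidates` are hypothetical. BLEU is computed at document level by pooling n-gram counts over all sentences, with a single reference and no smoothing.

```python
import math
import random
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def document_bleu(hypotheses, references, max_n=4):
    """Unsmoothed BLEU with n-gram counts pooled over the whole document."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = min(1.0, math.exp(1.0 - ref_len / hyp_len))
    return brevity_penalty * math.exp(log_precision)


def hill_climb(candidates, references, steps=200, seed=0):
    """Propose single-sentence changes; keep one only if document BLEU rises."""
    rng = random.Random(seed)
    current = [options[0] for options in candidates]
    best = document_bleu(current, references)
    for _ in range(steps):
        i = rng.randrange(len(candidates))
        proposal = list(current)
        proposal[i] = rng.choice(candidates[i])
        score = document_bleu(proposal, references)
        if score > best:  # strict improvement only
            current, best = proposal, score
    return current, best


# Toy data (invented for illustration): one reference and three options per sentence.
references = ["the cat sat on the mat", "it was a very cold day"]
candidates = [
    ["a cat is on a mat", "the cat sat on a mat", "the cat sat on the rug"],
    ["it was very cold", "it was a very cold morning", "the day was cold"],
]

if __name__ == "__main__":
    translation, score = hill_climb(candidates, references)
    print(translation, round(score, 4))
```

Because the score is pooled over the document, a change to one sentence is judged by its effect on the whole document rather than on that sentence alone. The search can only ever return one of the listed options, which mirrors the abstract's point that the reference itself is usually not reachable.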
Similar resources
Optimizing for Sentence-Level BLEU+1 Yields Short Translations
We study a problem with pairwise ranking optimization (PRO): that it tends to yield too short translations. We find that this is partially due to the inadequate smoothing in PRO’s BLEU+1, which boosts the precision component of BLEU but leaves the brevity penalty unchanged, thus destroying the balance between the two, compared to BLEU. It is also partially due to PRO optimizing for a sentence-l...
Blues for BLEU: Reconsidering the Validity of Reference-Based MT Evaluation
This article describes a set of experiments designed to test whether reference-based machine translation evaluation methods (represented by BLEU) (a) measure translation “quality” and (b) generate scores that are reliable as a measure for systems (rather than for particular texts). It considers these questions via three methods. First, it examines the impact of changing ...
Correlating Translation Product and Translation Process Data of Professional and Student Translators
The paper presents an exploratory study of the translation processes for 12 student and 12 professional translators. We relate properties of the translators’ process data (eye movements and keystrokes) with the quality of the produced translations, using BLEU scores and human evaluation scores for fluency and accuracy to assess translation quality. We also investigate how BLEU scores correlate ...
Bllip: An Improved Evaluation Metric for Machine Translation
In this paper we present a new automatic scoring method for machine translations. Like the now-traditional BLEU score it maps a proposed translation and a set of reference translations to a real number. This number is intended to reflect the quality of the proposed translation. We present some experiments that indicate that this new metric, the Bllip score (Brown Laboratory for Linguistic Infor...
Publication date: 2016